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Abstract. Many real-world networks exhibit correlations between the node 
degrees. For instance, in social networks nodes tend to connect to nodes of 
similar degree and conversely, in biological and technological networks, high- 
degree nodes tend to be linked with low-degree nodes. Degree correlations also 
affect the dynamics of processes supported by a network structure, such as the 
spread of opinions or epidemics. The proper modelling of these systems, i.e., 
without uncontrolled biases, requires the sampling of networks with a specified 
set of constraints. We present a solution to the sampling problem when the 
constraints imposed are the degree correlations. In particular, we develop an 
exact method to construct and sample graphs with a specified joint-degree matrix, 
which is a matrix providing the number of edges between all the sets of nodes of 
a given degree, for all degrees, thus completely specifying all pairwise degree 
correlations, and additionally, the degree sequence itself. Our algorithm always 
produces independent samples without backtracking. The complexity of the graph 
construction algorithm is 0[NM) where N is the number of nodes and M is the 
number of edges. 
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1. Introduction 

Complex systems often consist of a discrete set of elements with heterogeneous pairwise 
interactions. Networks, or graphs have proven to be a useful representational paradigm 
for the study of these systems E H li. The nodes, or vertices, of the graphs 
represent the discrete elements, and the edges, or links, represent their interaction. In 
empirical studies of real-world systems, however, for reasons of methodology, privacy, 
or simply lack of data, frequently there is only limited information available about 
the connectivity structure of a network. When this is the case, one has to take a 
statistical approach and study ensembles of graphs that conform to some structural 
constraints. This statistical approach enables the computation of ensemble averages 
of network observables as determined solely by the constraints, i.e., by the specified 
structural properties of the graphs. Ensemble modeling of this type is necessary to 
determine the relationship between the given structural constraints and the behavior 
of the complex system as a whole. Calculating ensemble averages, though, requires 
the ability to construct all the graphs that are consistent with the required structural 
constraints, a highly non-trivial problem. 

Perhaps one of the simplest examples of structural constraints that occur in 
data-driven studies of real-world systems is to fix the degree of each node, which 
is the number of edges that are connected to, or are incident on the node. For an 
undirected graph with N nodes this information is specified by a degree sequence 
T) = {di, (i 2 , • • •, where di is the degree of node i. Similarly, for a directed 
graph, a hi-degree sequence (BDS) V = {(dj", , (d^, dj) , • • •, } specifies 

the number of incoming and outgoing edges for each node where d~ denotes the in¬ 
degree, and d^ the out-degree, of node i. The situation of most practical interest 
is when we demand the graph with a given degree sequence to be a simple graph, 
which has the additional constraints that there can be at most one link (in each 
direction, if directed) between any two nodes, and that no link starts and ends on 
the same node (no self-loops). However, not all positive integer sequences can serve 
as the sequence of the degrees of some simple graph. If such a graph does exist, 
then the sequence is said to be graphical. Any simple graph (just “graph” from here 
on) with the prescribed node degrees is said to realize the degree sequence, and it 
is called a graphical realization of the sequence. The two main results used to test 
the graphicality of an undirected degree sequence are the Erdos-Gallai theorem E 
and the Havel-Hakimi theorem E E. For directed networks, instead, the main 
theorem characterizing the graphicality of a BDS is due to Fulkerson [S]- More 
recently, exploiting a formulation based on recurrence relations, new methods were 
introduced to implement these tests with a worst case computational complexity that 
is only linear in the number of nodes Eiiniiii]. The advantage of these methods over 
others with similar complexity [H] is that they also allow a straightforward algorithmic 
implementation. 

While the above results provide complete and practical answers to the question 
of the graphicality of sequences of integers, they do not suffice to solve the problem of 
constructing graphs with prescribed degrees. One of the main issues with constructing 
graphs for the purpose of ensemble modeling is that, except for networks of just a few 
nodes, the number of graphs realizing a degree sequence, or other possible constraints, 
is generally so large that their complete enumeration is impractical. Therefore, one 
has to resort to sampling the space of realizations by randomly generating networks 
with prescribed node degrees mm- For the case of degree-based graph sampling. 
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the existing approaches generally fall into two classes that can broadly be referred 
to as “rewiring” and “stub-matching”. Rewiring methods start from a graph with 
the required degrees and use Markov chain Monte Carlo (MCMC) schemes to swap 
repeatedly the ends of pairs of edges to produce new graphs with the same degree 
sequence m m usi [ig. Stub-matching methods, instead, are direct construction 
algorithms that build the graphs by sequentially creating the edges via the joining 
of two stubs of two nodes diiiiiiiiniEniET]. A stub represents a non-connected, 
“dangling half-edge” and a node has as many stubs as its degree. Unfortunately, these 
techniques can provide biased results, or are ill-controlled. In the case of the MCMC 
method the mixing time is in general unknown and thus one cannot know a priori 
the number of swaps needed to produce two statistically independent samples. Proofs 
showing polynomial mixing of the MCMC method have been recently developed for 
special degree sequences m mill US, and for the case of balanced realizations of 
joint-degree matrices |26| . However, none of these methods allows the determination 
of the exponent of the polynomial scaling. 

Among the stub-matching methods, the most commonly used algorithm, which 
is also ill controlled, is known as the configuration model. The configuration model 
was proposed in HZ] as an algorithmic equivalent of the results from Refs. Ezim, 
themselves based on prior models jHISO]- The algorithm randomly extracts two stubs 
from the set of all stubs not yet connected into edges, and connects them into an edge. 
If a multi-edge or a self-loop has just been created, the process is restarted from the 
very beginning to avoid biases. However, depending on the degree sequence, this 
process can become very inefficient with an uncontrolled running time, just like the 
MCMC method. Alternatively, one can ignore multi-edges and self-loops, and fix them 
“by hand” at the end of the process. However, doing so produces significant biases 
even in the limit of large system size m- Recently, a novel family of stub-matching 
algorithms were introduced for both undirected |g and directed la degree sequences 
(reproduced here in Appendix A I, based on the so-called star-constrained graphicality 
theorems |32[ 155] . These algorithms generate statistically independent samples with 
a worst case polynomial time of 0{NM), where M is the total number of edges. The 
samples are not generated uniformly. However, their statistical weights are computable 
and can be used to obtain results in an importance sampling framework H [Ml HU ESI- 
Note that the solution for the directed sequences also solves the problem for bipartite 
sequences because a bipartite graph can always be represented as a directed one in 
which one of the two sets of nodes has only outgoing edges, and the other set has only 
incoming ones. 

Graph construction and sampling becomes even more difficult when there are 
structural constraints of higher order, such as correlations amongst the node degrees. 
Degree correlations can be expressed in several ways, for example with the help of 
the conditional probability P{d'\d) that a node of degree d will have a neighbor of 
degree d', or more simply, by the average degree of the neighbors of a node with 
degree d, d'{d) — J2d' d'P{d'\d) [Mj- The properties of d'{d) characterize the so-called 
assortativity of a graph, which is a measure of the tendency of a node to connect to 
nodes of similar degree. If d'{d) is increasing in d, the graph is degree assortative, 
if it is decreasing the graph is degree disassortative, and if it is constant, the graph 
is degree uncorrelated. Even more coarse-grained measures of degree correlations are 
possible, including the Pearson coefficient m, the Spearman coefficient [35] and the 
Kendall coefficient |M]- These coefficients assume values ranging from —1, for highly 
disassortative graphs, to 1, for highly assortative ones. 
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A more precise way to express degree correlations is via the use of a joint- 
degree matrix. The joint-degree matrix (JDM) of a given undirected simple graph 
is a symmetric matrix whose (a, j5) element is the number of edges between nodes 
of degree a and nodes of degree /3. The dimensions of the JDM are A x A, where 
A is the largest degree of a node in the graph. The degree correlation measures 
discussed above specify the correlations only statistically, but they do not fix the 
number of edges between nodes of given degrees, whereas the joint-degree matrices 
do. In this sense, the relationship between joint-degree matrices and the statistical 
degree correlation measures is similar to the relationship between degree sequences 
and degree distributions. 

Degree correlations have generated considerable interest, as they are known to 
affect many structural and dynamical properties of graphs and the processes they 
support gniiiDiiiiisiiiiiiaiisiiiTi. Nevertheless, even though their importance is 
well established, it has heretofore not been possible to perform ensemble modeling of 
graphs with prescribed joint-degree matrices. In this Article, we solve this problem by 
developing an algorithm based on the stub-matching method to construct and sample 
ensembles of graphs with a specified joint-degree matrix. 


2. Mathematical foundations 


2.1. Graphicality of JDMs 


The problem of graphicality for JDMs asks whether a specified symmetric matrix can 
be the JDM of a simple graph. Our starting point is an Erdds-Gallai-like theorem 
that gives the requiements for a JDM to be graphical |48l |49l [50] . 

Before stating the theorem, though, note that a JDM specifies uniquely the degree 
sequence of the graphs that realize it |1S] . Given a JDM J, the number of nodes with 
degree a is 


IK.| = 


J OlO. “I” 


a 


A 

>a«/3 

/3=1 


where Va is the set of nodes, or degree class, with degree a. As a general rule of 
notation we will use lowercase Greek letters to indicate degree values and lowercase 
Latin letters for node indices. In the equation above the sum of each row a of J is 
the number of connections involving nodes of degree a (i.e., all nodes in class Va). 
As each node of degree a has exactly a stubs the total number of nodes of degree 
a is given by the notal number of stubs from all nodes in class divided by a. 
Moreover, each edge between nodes of the same degree involves 2 stubs. Thus, the 
diagonal elements must be double-counted. Note that multiple JDMs can specify 
the same degree sequence and thus prescribing a JDM is more constraining than only 
prescribing a degree sequence. With the definitions above, the necessary and sufficient 
conditions for a JDM to be graphical can be stated as follows |481 [151150] : 


Theorem 1 (JDM graphicality). A symmetric A x A matrix J with non-negative 
integer elements is a graphical JDM if and only if: 


1 ) 

2 ) 

3) 


1141 is an integer VI 4 Qf 4 A, 

114 


Jfycy 4 


VI 4 o 4 A, and 


Jap^lVaWVpl VI 4 Qf,/3 4 A and a ^. 
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Figure 1. Graphical realizations of a simple JDM, given in Q. Panels (a) 
and (d) are degree class representations, while panels (b) and (e) are regular 
representations. The color of the edges indicates the subgraph they belong 
to. Panels (c) and (f) show the corresponding degree-spectra matrices for the two 
realizations; they differ in the bold red entries. 


It is important to observe that any graphical realization of a JDM can be 
decomposed into the disjoint union of a set of subgraphs that are bipartite (a ^ /3) 
with node sets Vh and and edges between them or unipartite (a = /3) with 
node set Vh and edges within that set. We are going to call such representation of 
a graphical realization a degree class representation. A simple example of a graphical 
JDM with = 10 and A = 4 is given by the matrix: 


/ 0 0 0 1 \ 

0 0 4 4 

0 4 13 

\ 1 4 3 0 / 


( 1 ) 


Panels (a) and (b) of Fig. show a graphical realization of J in degree class 
representation and regular representation, respectively. Panels (d) and (e) of the same 
figure show another realization of J in the two representations. The color of the edges 
indicate the subgraph they belong to. For example, G 24 is a bipartite graph between 
nodes of degree 2 (V2) and 4 (V4), respectively, having J2,4 = 4 edges drawn in green 
color, whereas G33 is unipartite with a single J33 = 1 edge drawn in blue. Note that 
while both graphical realizations have the same JDM, they are very different graphs. 
To see this, consider the counts of cycles C( of length i (a cycle is a closed path 
without repeated nodes). The graph in Fig.fHb) has = 1, 714 = 2, = 1, ng = 2, 

ut = 3 and ns = 3, whereas the one in Fig. [He) has ns = 1, 714 = 1, 775 = 2, Tig = 3, 
717 = 4 and rig = 1. 

Theorem [l] is an existence theorem, just like the Erdos-Gallai theorem for the 
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case of degree sequences, and as such it does not provide an algorithm that can 
generate simple graphs with a given JDM. More importantly, we also need an algorithm 
that does not exclude classes of graphical realizations of a given JDM, but that can 
construct in principle any such realization. The situation is similar to that of degree 
sequences. In that case the Havel-Hakimi method P [7] is always able to create a 
graphical realization of a graphical degree sequence, but cannot construct them all, 
i.e., there will be some realizations that can never be built by this algorithm. This 
was the reason for the introduction of the notion of star-constrained graphicality in 
Refs. [5^ [55] and the subsequent construction algorithms in Refs. PHI]. Here as 
well, we want to have a direct construction algorithm and ultimately an exact sampler 
that does not exclude any realization of a JDM. Due to the different nature of the 
constraints from the degree-sequence-based case, we need to develop a novel approach. 

The idea of the approach is based on the degree class representation above. Since 
the edges of the subgraphs Ga 0 are disjoint, we could build a graphical realization G 
of the JDM J by building all these subgraphs, while respecting the constraints. For 
a Gap subgraph we know its node set(s) and its total number of edges Jap- Consider 
then a node v € Va- We are not given its degree in Gap for any /?, but we know that 
the sum of its degrees within every one of these subgraphs must add up to a. For 
example, the sum of the numbers of the purple, green and red edges coming out of node 
2 in Fig. must add to 4. In addition, we also have the constraints that the sum of 
the degrees of one color of all nodes within Va must equal to the corresponding given 
JDM entry. Indeed, for example, the sum of all green edges in Fig. [^a) or Fig. m 
is J 2,4 = 4, for orange is 4, red is 3, etc. Thus, the idea of the algorithm is to first 
determine the degree of a given color respecting the constraints for all nodes and all 
colors, then use our methods introduced earlier PHD (see jAppendix A| to build the 
Gap subgraphs based on the corresponding degree sequences of their nodes. Different 
graphical realizations will be obtained from different assignments of color degrees and, 
of course, from the different graphical realizations of the same set of degrees. Note 
that for the bipartite subgraphs Gap we are specifying degree sequences for nodes in 
both partitions Va and Vp and thus we can use our graph construction method for 
directed graphs El, because a bipartite graph can be represented as a directed graph 
if nodes in one partition have only outgoing edges and in the other only incoming 
edges. In the following it will be useful to introduce the notion of degree spectra, 
representing the degrees of different colors of a node, as described above. 


2.2. Degree spectra 

Consider a single row a of a graphical JDM J. The information contained in the row 
determines the precise number of edges needed between nodes of degree a and nodes 
of every degree. In other words, of all the stubs coming from Va, Ja,i of them must 
end in a node of degree 1, Jq, 2 of them must end in a node of degree 2, and so on. 
However, these matrix elements do not specify how to distribute these edges within 
and between the degree classes. To better specify these connections one introduces 
the notion of the degree spectra, which can be conveniently represented as a matrix. 
The degree spectrum of a node is the sequence of its degrees towards all the degree 
classes, including its own degree class. A degree-spectra matrix S' is a A x N matrix 
whose {a,i) element Sai is the number of edges between node i and degree class a 
(the set of nodes of degree a). The i^^ column of S defines the degree spectrum 
of node i. Panels (c) and (f) of Fig. [^how two representations of the same JDM 
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Figure 2. Schematic for the partial degree sequence problem. 


given in Eq. [l] In general, there are many degree spectra matrices that correspond 
to the same JDM. As described in the previous section, we employ a two-step process 
in order to randomly sample graphs that realize a given JDM. First, we generate a 
random degree-spectra matrix from the JDM. Second, we construct a random graph 
that realizes the JDM and that obeys or is consistent with the chosen spectra matrix. 
This approach creates the need for a method to guarantee that the spectra generated 
from a JDM are graphical. 

The generation of a graphical degree-spectra matrix proceeds systematically, node 
by node. Therefore, at each step, some nodes will have an already fixed number of 
links within some of the subgraphs (links of a given color), while for the rest these 
numbers will not have been determined yet. Thus, at any time during this process we 
have a partial degree sequence of a bipartite graph. As the subgraphs must be simple 
graphs (realizable), one must be able to decide whether a partial bipartite degree 
sequence is graphical. The sufficient and necessary criterion for the graphicality of a 
partial bipartite degree sequence will be given in Theorem below. However, that 
will not necessarily mean that the whole JDM J is still realizable, in other words, how 
do we know that by guaranteeing the graphicality for a subgraph Gap we have not 
precluded graphicality of some other subgraph G^s-, f^nd ultimately of J? The answer 
to this question will be given by Theorem]^ later on. Together, these theorems form 
the basis for our algorithm to generate graphical degree spectra. 

Before proving a theorem that provides a graphicality test for partial bipartite 
degree sequences, we need to set some notations. Let A, B, H and K be four sets of 
nodes: 

A= {ai,02,---,a|^|} B = {bi,b2,- ■ ■ A\b\) with A n H = 0 
H ^ [hi,h 2 ,-■ ■ ,h\H\] K = [ki,k 2 ,-■ ■ ,k\K\] with iJ n AT = 0 

and let [7 = A VJ B and V = H U K (see Fig. [^. The sets can be of different 
size, but neither U nor V can be empty. Now, let V = {pi,P 2 , ‘ ’ ,P\a\} 

Q = {qijQ 2 , ■ ■ ■ ,q\H\} be two given sequences of integers. They will represent the 
partial bipartite degree sequences that have already been fixed by the algorithm up 
to that point. The degrees of the other nodes, specifically those in the sets B and 
K, are not yet specified. What is specified is the total number of edges £ in the 
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bipartition, i.e., the total number edges running between the sets U and V. Then, 
the partial bipartite degree sequence triplet {V, Q, e), hereafter simply called a triplet, 
is graphical if there exists a bipartite graph on U and V with £ edges and degree 
sequences 'D{U)\^ = V and 'D{V)\^ = Q. In other words, the bipartite graph must 
be such that the nodes in A have degree sequence V and those in H have degree 
sequence Q. The partial degree sequence problem is to decide whether one can choose 
the degrees of the nodes in the sets B and K such that the above constraints are 
satisfied and the bipartite degree sequence T) is graphical. 

Since the graph realizing a triplet is bipartite, the number of edges e equals the 
number of stubs in either set of nodes: 

\u\ Tl 

^ ^ ^ ^Ui - ^ ^ *ivi ■ 

i=l i=l 

The imposed partial sequences V and Q prescribe a certain number of stubs in the 
first |A| nodes of U and in the first \H\ nodes of V. Let these be P = 

Q = X]i=i ft) respectively. Then, the set B must contain exactly e — P stubs; similarly, 
the set K must contain exactly s — Q stubs. With these considerations, we first define 
the concept of a balanced realization of a triplet. Let /i = and v = A 

realization of a triplet is defined to be balanced if and only if the degree of any node 
in B is either [p,\ or [/i], and the degree of any node in K is either \ v\ or \v\. Notice 
that this means that \i p, or v are integers, then all the nodes in P or Lf must have 
exactly degree p or v, respectively. Conversely, if they are not integers, then the 
degrees of any two nodes in B or in K, respectively, can differ at most by 1. That 
is, a realization is balanced if and only if all the degrees of the nodes that one is free 
to choose (those in B and K) are as close as possible to their averages p and iz. The 
definition can be equivalently formalized by introducing a functional / acting on B 
and K: 

isi m 

f (B) ='^[\db^ - p\\ and / (iL) = ^ [|4. - • 

i=l i=l 

Then, a realization of a triplet is balanced if and only if both / (P) and / {K) vanish. 

An important theorem about the graphicality of triplets can now be proven. 
Theorem 2. The triplet {P,Q,e) is graphical if and only if it admits a balanced 
realization. 

Proof. Sufficiency is obvious. If the triplet admits any realization, balanced or not, it 
is graphical by definition. 

To prove necessity, suppose the triplet is graphical. Then, it admits a realization 
G. If G is balanced, then there is nothing to do. Conversely, if G is not balanced, 
then / (P), / (K), or both, are greater than 0 . Without loss of generality, assume that 
/ (P) > 0 . Then, there exists a node bi € B such that either db. < [p\ or dt, > \p']. 
Again without loss of generality, assume that db^ < lp\ (the other cases are treated 
analogously). Then, since the number of stubs within P is fixed, there must exist a 
node bj G P such that db > [/ij and thus db > db^. But then, there must exist a 
node Vk & V such that Vk is connected to bj but not to bi. Now, remove the edge 
{vk,bj) and replace it with {vk,bi). This yields a different realization with the same 
degrees for the nodes in V, and in which / (P) is decreased by at least 1, as the 
degrees of P moved towards the balanced condition. The procedure can be repeated 
until / (P) = 0 , resulting in a balanced realization. □ 
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A key consequence of this theorem is the following. 

Corollary 1. Let {V, Q,e) be a graphical triplet, and let x be a node in B or in K. 
If there is a realization of the triplet in which dx = a and another in which dx = /?, 
with a < P, then for all 7 with a "f ^ ft there exists a realization in which dx = 7- 

Proof. Without loss of generality, assume x G B. Then, there are several cases, 
each determined by the relative values of a, fi and [/rj. The most general case is 
a < [/ij < /3, so consider only this situation. Start from the realization with dx = (3. 
Repeated applications of the method in the proof of Theorem will eventually yield 
a realization in which dx = [/rj. For each step, the degree of x will have decreased 
by 1. Therefore, one realization of the triplet will have been found with dx = "f for all 
LmJ < 7 ^ /3. 

Now, start from the realization with dx = ol. Applying the same step from the 
proof of Theorem repeatedly will eventually yield a realization in which dx = [/J.J. 
For each of these steps, the degree of x will have increased by 1. Therefore, one 
realization of the triplet will have been found with dx = ct for all a ^ 7 ^ . □ 

Notice that, given a graphical triplet. Corollary also implies the existence of 
minimum and maximum allowed degrees for each node whose degree has not yet 
been fixed in that triplet (namely, in B and K). That is, a realization of the triplet 
exists with a node having either its minimum or maximum degree, or any degree 
between these two values. Of course, the value of the minimum and maximum degree 
will depend on which degrees have been fixed up to that point, so these need to 
be computed on the fly. How to calculate these degree bounds will be explained in 
Subsection 13.11 

2.3. Building a degree-spectra matrix 

Corollaryj^suggests the possibility of a direct, sequential way to build a degree-spectra 
matrix from a JDM. However, building the degree-spectra matrix node by node is a 
local process, which guarantees via Theorem only that the bipartite graph in which 
the node whose degree spectrum is being set resides is graphical. There is a global 
constraint, however, on every node, namely that the sum of their degree spectra must 
add up to the degree of the class they belong to. We have to make sure that the 
local construction process also respects the global constraints, i.e., it is feasible with 
it. The theorem below will show that this sequential construction process is feasible, 
and just as importantly, all graphical realizations of a JDM J can be constructed in 
this way, i.e., all graphical degree-spectra matrices can be obtained by this sequential 
construction process. 

Theorem 3. Let S be the subset of all the nodes with fixed spectra; then, there exists 
a realization of a JDM J consistent with the fixed spectra if and only if for every {a, /3) 
pair with a, f3 € {!,..., A} there exists a graph with Ja^p edges also satisfying 
the fixed spectra of S. 

Proof. Necessity is obvious. If there exists a realization of J satisfying the spectra, 
then each subgraph between any pair of degree classes both satisfies the spectra and 
has the right number of edges. 

To prove sufficiency, assume that we have a fixed degree spectrum for all the 
nodes in S and we have guaranteed the graphicality of all the subgraphs Gap- They 
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have the right number of edges Ja^g and their nodes satisfy the fixed spectra specified 
in the subset S. Since we have guaranteed graphicality for all the Gap subgraphs with 
these constraints, let us consider some graphical realization for each such subgraph 
and consider their union graph G. If the “free” nodes, i.e., those without a fixed 
spectrum, have all the correct degree in G (i.e., every node v G Va has dy = a for all 
a), then there is nothing to do. Now, assume they don’t. Since the total number of 
edges in each Gap is correct by hypothesis, there must exist a degree a and two free 
nodes v and w belonging to tfy such that dy < a and dyj > a. Thus, there must exist 
a node u connected to w but not to v. Then, erase the edge (u,w), and replace it with 
{u,v). This leaves the numbers of edges in all Gap unchanged, and does not change 
the degree spectrum of u, because v and w belong to the same degree class. Repeating 
this procedure results eventually in all the nodes having the correct degree. □ 

Theorem is fundamentally important as it justifies a systematic, node-by-node 
approach in building a graphical degree-spectra matrix. In fact, so long as one 
guarantees the possibility of subgraphs with the correct number of edges, a partial 
degree-spectra matrix maintains the graphicality of the JDM. 

The only detail left is specifying how to choose the numbers that form the degree 
spectra. Fortunately, this is straightforward. As mentioned in the previous Subsection, 
an implication of Corollary is the existence of minimum and maximum allowed 
degrees for nodes in partial degree sequences. Let them be m (minimum) and M 
(maximum). But a partial degree sequence is nothing else than a partially built 
degree spectrum, if one recognizes the node sets U and V as two degree classes. Then, 
a condition that must be satisfied in building a degree-spectra matrix is that any new 
number chosen to augment a partially built degree spectrum has to be within these 
bounds. However, one must also consider that if a node belongs to a certain degree 
class, it must have the correct total degree. 

To state both conditions, assume the degree spectrum of node u S Ify is being 
built. Let T be the set of degree classes for which a spectrum element has already 
been chosen, and let Spy be the element to determine next. Then, a valid value k for 
Spy must satisfy the two conditions 

mp ^ k ^ Mp (2) 

a-k-'^Snv^ M^. (3) 

r;^(ru/3) jycr r]^{rup) 

Below, in Subsection |3.I| we describe how to compute the min and max values for 
degree spectra elements. 

3. The algorithm 

3.1. Description 

We are now ready to describe our JDM sampling algorithm. The algorithm is 
composed of two parts. The first is a spectra sampler that randomly generates degree- 
spectra matrices from a graphical JDM J: 

(i) Initialize i = 1. 

(ii) Set a = I. 

(iii) Let I be the number of the residual, unallocated stubs of node i. If Z fy 0: 
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Gap 



• : fully fixed degree 
spectrum 

O : partially fixed degree 
spectrum 



O: “free” nodes 


balanced 


Figure 3. Sequentially determining graphical degree spectra consistent with a 
given JDM J. 


(a) If + 0: 

1. For all ct ^ ^ ^ A, if Jdi ,/3 0, find and Wfc; otherwise, set 

mk = Mk = 0 . 

2. Compute t = ^nd T = E^=a+i ^P- 

3. Find the actual minimum and maximum allowed for the degree-spectrum 
element: r = max {toq,, I — T'\ and R — min {Mq,, I — t}. 

4. Extract an integer Sa,i uniformly at random between r and R. 

(b) Increase ct by 1, and go to step (iii). 

(iv) Increase J by 1. If j ^ iV, go to step (ii). 


To find the values of m and M in step (iii).a.l above, consider the degrees of 
the nodes belonging to Vq and Vp in Gap- In the formalism of Subsection 2.2 the 


already fixed spectra elements are equivalent to the sequences V and Q. Then, to test 
the viability of a given value as a degree-spectrum element, assign it to the element 
being determined, complete the degree sequence making it balanced, and test it for 
graphicality, see Fig. If the sequence is graphical, then the triplet has a balanced 
realization, which by Theoremj^is a necessary and sufficient condition for the existence 
of a subgraph corresponding to the spectrum element being determined. If Gap is 
unipartite, the graphicality test can be done using the fast method described in [3]. 
The situation is marginally different if Gap is bipartite. In this case, as previously 
mentioned, the degree sequence can be built as a BDS in which nodes of degree a only 
have incoming edges, and nodes of degree j5 only have outgoing ones. This sequence 
can then be tested with the fast directed graphicality test described in m- 

Thus, to find the minimum value m one can simply run a sequential test, checking 
for valid spectrum values from 0 onwards. The first successful value is m. Then, to 
find M, use bisection to test all the values from m -I- 1 to the theoretical maximum, 
looking for the largest number allowed. Clearly, the theoretical maximum at that 
stage is the degree of the class the node belongs to minus the sum of the already fixed 
spectra values for that degree. 

These considerations also clarify the nature of the second part of the algorithm, 
which samples realizations of the JDM from an extracted degree spectra matrix. 
Summarizing, 


• JDM realizations can be decomposed into a set of independent unipartite and 
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bipartite graphs. 

• The degree spectra define the degree sequences of the component subgraphs. 

Then, to accomplish the actual sampling, extract the degree sequences from the 
degree spectra and use them in the graph sampling algorithms for undirected and 
directed graphs presented in ISIII] and in here inf 


Appendix A Every time a sample 


is generated, it constitutes a subgraph of a JDM realization. All that is needed in the 
end is simply to list the edges correctly, since the graph realizing the JDM is the union 
of all the unipartite and bipartite subgraphs into which it has been decomposed. 


3.2. Sampling weights 


Our algorithm does not extract all degree-spectra matrices from a JDM with the same 
probability. However, the relative probability for the extraction of each spectra matrix 
is easily computed, and it can be used to reweight the sample and obtain unbiased 
sampling. If every new element of a degree-spectra matrix is extracted uniformly at 
random between r and R, its probability of being chosen is simply . Therefore, 

the probability of extracting a given spectra matrix S is p {S) = R-r+i ’ ’"^here 

m is the total number of elements extracted. Then, an unbiased estimator for a 
network observable Q on an ensemble of Z spectra matrices can be computed using 
the weighted average 


(Q) 


Y.f=i 'Wi 


(4) 


In the expression above, Qi is the value that Q assumes on the i^^ sampled matrix. 
Indicating by rj and Rj the values that r and R assume for the matrix element 
extracted, the weights are 


= n ~ ■ 

1=1 


(5) 


Of course, besides the spectra matrix, every subgraph has its own sampling 
weight. Thus, the total weight of a single JDM sample is the product of the 
corresponding spectrum weight and all the subgraph weights. To describe the 
distribution of the sample weights, first recall that the individual subgraph weights 
are log-normally distributed mn]. Thus, as the sample weights are their product, we 
expect them to be log-normally distributed too. Also, for large JDMs, where ^ 1, 
the m factors in Eq.f^are effectively random. Thus, our expectation is that the spectra 
weights are log-normally distributed as well. To verify this, we extracted the JDM of 
a random scale-free network with 1000 nodes and power-law exponent of 2.5, and used 
it to generate an ensemble of 10® degree spectra matrices and one of 10® JDM samples 
of a single spectra matrix. Figure shows that the histograms of the logarithms of 
spectra matrix weights and sample weights are well approximated by a Gaussian fit, 
supporting our assumptions. 

A simple and small example is provided in [Appendix B| There, we analytically 
compute the JDM ensemble averages of the local clustering coefficients of nodes of all 
degrees, based on unweighted sampling and also based on weighted sampling, with the 
weights provided by the algorithm. In table |B1| we show the results of simulations 
using our algorithm, taking into account the sample weights (as described above), 
and simply computing the averages of the clustering coefficients over the samples 
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Figure 4. Log-normal distribution of weights. The top panel shows the histogram 
of the natural logarithms of the weights for an ensemble of 10® degree-spectra 
matrices; the bottom panel shows the histogram for an ensemble of 10® sample 
weights. Both distributions (solid black lines) are well fitted by a Gaussian curve 
(dashed red line). 


generated. The results between theoretical and simulated measures agree very well. 
The differences between weighted and unweighted versions can be also appreciated, 
and while they are small in this example, they are measurable and need to be taken 
into account in general. 

3.3. Computational complexity 

To determine the computational complexity of the algorithm, first note that the main 
cost in creating a spectra matrix comes from the repeated graphicality tests. Let A 
be the number of non-empty degree classes in the JDM 

^=|{a: ya^0}| . 

Then, for each of the NA non-trivial elements in the degree-spectra matrix, A tests 
are needed, each with a computational complexity of the order of the number of nodes 
in the corresponding degree class. Thus, the total computational complexity for the 
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spectra construction part of the algorithm is 


Cs = 0 


A A 


^EE 1^/31 


a—l B—Oi 


( 6 ) 


Notice that in our treatment one is free to choose the order of the degree classes. Thus, 
to minimize the complexity, one can simply determine the degree-spectra elements 
in descending order of degree class size. Then, the worst case corresponds to the 
equipartition of the nodes amongst degree classes, |Vq| = In this case, it is 

Cs = 0 = O (N^A) , 

which reduces to 


Cs = 0 (N^) 

if the number of degree classes is of the same order as the number of nodes. 

A more precise estimate for a given JDM can be obtained by rewriting Eq. as 

( A A 

a—1 ^—a 

where the degree distribution P (d) = | Vd| /N is the probability that a randomly chosen 
node has degree d. It is easy to see, then, that the worst case is unlikely to occur. 
Consider for instance systems of widespread insterest, such as scale-free networks, for 
which P (d) ^ d~^ with 7 > 2. Then, in the limit of large networks, the equation 
above becomes 

Cs = o(^N^J^ dxj dk - 1) =0{N^) . 

Thus, in this case, the complexity leading order for spectra matrix extraction is only 
quadratic. 

Given a degree-spectra matrix, to construct a JDM realization one then needs to 
build O (A^) subgraphs, each with O (^) nodes and O edges. For each subgraph, 
the computational complexity is of the order of the number of nodes multiplied by the 
number of edges. Thus, the total sampling complexity is = 0{NM). 

Therefore, the total complexity of the graph construction part of our method is 
O {N"^) for sparse networks, and O for dense ones. Once more, we do not expect 
the worst case complexity to occur often. For example, in the already mentioned 
case of scale-free networks, which are always sparse m, the total complexity of 
our algorithm would only be quadratic. A less efficient sampling method has been 
developed recently jH], but it is based on backtracking, producing results containing 
biases that are uncontrolled and that cannot be estimated. 



4. Conclusions 

In summary, we have solved the problem of constrained graphicality when degree 
correlations are specified, developing an exact algorithm to construct and sample 
graphs with a specified joint-degree matrix. A JDM specifies the number of 
edges that occur between degree classes of nodes (nodes of given degrees), and 
thus completely determines all pairwise degree correlations in its realizations. Our 
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algorithm is guaranteed to successfully build a random JDM sample in polynomial 
time, systematically, and without backtracking. It is also guaranteed to be able 
to build any of the graphical realizations of a JDM. Each graph is constructed 
independently and thus there are no correlations between samples. Although the 
algorithm does introduce a sample bias, the relative probability for the construction 
of each sample is computable, which allows the use of weighted averages to obtain 
unbiased sampling (importance sampling). However, importance sampling is only 
exact in the limit of an infinite number of samples. This raises the issue of convergence. 
The lognormal distribution of weights makes convergence slow, but for small- to 
medium-sized networks good accuracy can be achieved, and quantities computed as if 
from uniform sampling. Improving the speed of convergence is a challenging problem, 
partly because it depends on the constraining JDM, and will be addressed in future 
publications. 

Degree correlations in real-world systems have been widely observed. Social 
networks are known to be positively correlated, and the concept of assortativity was 
known to the sociological literature before it was employed in applied mathematics. 
Technological networks are also characterized by particular correlation profiles. 
Moreover, correlations significantly affect the dynamics of spatial processes, such as 
the spread of epidemics |3]. Thus, with our algorithm, one can model correctly complex 
systems of general interest with desired degree assortativity. For the first time, this 
enables the study of networks in which the correlations are not determined solely 
by the nodes’ degrees. For instance, there exist many studies about social networks, 
consisting of a comparison between some specific real-world network and a randomized 
ensemble of networks with the same degree sequence or degree distribution. As social 
networks are scale-free, these studies often just sample the same sequence or the same 
type of power-law sequences to produce null-model results. However, social networks 
are assortative, while random scale-free networks are on average disassortative. Thus, 
the average correlations of scale-free networks make degree-sequence and degree- 
distribution sampling problematic if one is trying to consider a random model of 
a social network. Our method allows one to avoid this problem by directly imposing 
the correlations, rather obtaining only those imposed by the degree sequence. 

Upper bounds on the computational complexity of our algorithm show that in 
the worst case it is cubic in the number of nodes. However, we provide a way to 
compute the expected worst-case complexity if the degree distribution of the networks 
considered is known. This shows that, for commonly studied cases such as scale-free 
networks, the maximum complexity is only of the order of N'^, making the algorithm 
even more efficient. 
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Appendix A. Direct construction of random directed and undirected 
graphs with prescribed degree sequence 

In order to fully describe our algorithm for sampling graphs with prescribed degree 
correlations, we include in this appendix succinct descriptions of our algorithms for 
sampling random undirected [S] and directed la graphs with a prescribed degree 
sequence. Both are used in our algorithm to sample graphs with a prescribed JDM, 
and both work by directly constructing the graphs. So long as the prescribed degree 
sequence is graphical, both algorithms are guaranteed to successfully construct a 
graph without backtracking. They accomplish this by building the graph an edge 
at a time, connecting pairs of stubs, maintaining the graphicality of the residual 
stubs throughout the construction process. The algorithms make use of our fast 
methods for testing the graphicality of degree sequences, which are also described 
below. The worst case complexity is O (N) for the graphicality tests, and O {NM) 
for both sampling algorithms. Both algorithms generate biased samples, but we also 
state the relative probability of generating a sample, which can be used to calculate 
unbiased statistical averages. See our previous publications for proof of the correctness 
of these algorithms mui]; they are stated without proof or detailed explanation here. 

Appendix A. 1. Undirected graphs 

A nonincreasing sequence of integers T) = {di, di,..., d]\[} is graphical if and only if 
Sfci di is even, and Lk ^ Rk for all 1 ^ A: < A, where Lk and Rk are given by the 
recurrence relations 

Li = di 

Lk = Lk-i + dk 

and 

Ri=N-l (A.3) 

Rk-i+Xk-'i yk < k* .. . 

Rk-i + 2{k - 1) - dk yk^k* ^ 

and we defined the crossing indices Xk = min {i : di < k}, and k* = 
min{* : Xi < i + 1}. Thus, to test the graphicality of V: 

(i) Sum the degrees to determine if even. If false, then stop; V is not 

graphical. If true, continue. While summing the degrees, also calculate the 
crossing indices Xk for each k and determine k*. 

(ii) Test if Li < = A — 1. If false, then stop; T> is not graphical. If true, set k = 2 

and continue. 

(iii) Test if Lk < Rk- If false, then stop; V is not graphical. If true, increase k by one 
and repeat. Continue until k = N — 1, then stop; V is graphical. 

Given a nonincreasing graphical degree sequence V, a random undirected graph 
that realizes V can be constructed by: 

(i) To each node, assign a number of stubs equal to its degree. 

(ii) Choose a hub node i. Any node can in principle be chosen, for example, the node 
with the largest degree. 

(iii) Create a set of forbidden nodes X, which initially contains only i. 



(A.l) 

(A.2) 
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(iv) Find the set of allowed nodes A to which i can be linked preserving the graphicality 
of the remaining construction process. To find A, first determine the maximum 
fail degree k using the method described below. Then A will consist of all nodes 
j ^ X that have remaining degree greater than k. 

(v) Choose a random node m € A and connect i to it. 

(vi) Reduce the value of di and dm in I? by 1, and reorder it. 

(vii) If TO still has unconnected stubs, add it to the set of forbidden nodes X. 

(viii) If i still has unconnected stubs, return to step (iv). 

(ix) If nodes still have unconnected stubs, return to step (ii). 

To determine the maximum fail degree in a degree sequence V being sampled, 
build the residual degree sequence T)', by connecting the hub node i with remaining 
degree di to the di — 1 nodes with the largest degrees that are not in the forbidden set 
X and reducing the elements of V accordingly. Then, compute the graphicality test 
inequalitites. Each inequality potentially yields a fail-degree candidate, depending on 
the values of and Rk- For each value of k there are only 3 possibilities: 

(a) Lk = Rk 

(b) Lfc = — 1 

(c) Lk ^ Rk — 2 

In case (a), the degree of the first non-forbidden node whose index is greater than k is 
the fail-degree candidate. In case (b), the degree of the first non-forbidden node whose 
index is greater than k and whose degree is less than fc -|-1 is the fail-degree candidate. 
In case (c), there is no fail-degree candidate. The sequence of candidate nodes is 
non-decreasing until the fail-degree is found. Thus, one can stop the calculation when 
either the current fail-degree candidate is less than the previous one, or when a case (a) 
happens. 

This algorithm generates graph samples biasedly. However, the relative 
probability of generating a particular sample /i is 

m di ^ 

Pu = Yld^\YlY—, (A.5) 

z=l j = l 


where di is the residual degree of node i when it is chosen as a hub, to is the total 
number of hubs used, and Ai^ is the allowed set for the link of hub i. Thus, an 
unbiased estimator for a network observable Q for any target distribution P is the 
weighted average 


(Q) 




(A.6) 


where M is the number of samples and w^. = For uniformly sampling the 

networks, P is constant and it cancels out of the formula. 


Appendix A.2. Directed graphs 

A bi-degree sequence (BBS) D = {{df ,df) , {df , dj) > • • • i {dfj, d '^')} of integer pairs, 
ordered so that the first elements of each pair form a non-increasing sequence, is 
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graphical if and only if = Sti ^ < -Rfe for all 1 < fc < iV — 1, 

where and R}. are given by the recurrence relations 


Li — d^ 

Lk = Lk-i + dk 


(A.7) 

(A.8) 


and 


R^= N -l-Gi{Q) 


f Rk-i + N — Gk-i (k 
\ Rk-i + N — Gk-i (k 


and Gk and Gk are defined as follows. Let 


9i (k) 


df + 1 Vi < /c 
Wi > k 


Then 


N 

Gk ijp) ^ ^ ’ 

i=l 


1) Vd+ < k 

1 ) - 1 yd+^k 


(A.9) 

(A.IO) 


(A.ll) 


(A. 12) 


where S is the Kronecker delta, and G is given by the recurrence relation 

Gi(l) =Gi(0) + Gi(l) (A.13) 

Gk (k) = Gk-i (fc - 1) + Gi (k) + S{k) , (A.14) 


where 

k — 1 k 

^ (k) = '^^Skd++i—'^^Skd+ ■ (A.15) 

t=2 t=2 

To efficiently test the graphicality of a BDS T>, 

(i) Sum the in- and out-degrees to determine if df ■ If false, then 

stop; T> is not graphical. If true, continue. While summing the degrees, also 
calculate Lk for each k. 

(ii) Compute Gi (fc) for each k. 

(iii) Compute S (fc) for all k: 

(a) Initialize S (k) to 0 for all k. Set i = 2. 

(b) If ^ i, decrease S (dll) by 1- 

(c) If d+ + 1 > i, increase 5(d+ + l) byl. 

(d) Increase i by 1. If i ^ iV, repeat from step (b). 

(iv) Test if Li < Ri. If false, then stop; T> is not graphical. If true, set k = 2 and 
continue. 

(v) Test if Lk < Rk- If false, then stop; T> is not graphical. If true, increase k by one 
and repeat. Continue until k = N — 1, then stop; T> is graphical. 

Given a graphical BDS of integer pairs V in lexicographic order, a random directed 
graph that realizes T> can be constructed by 

(i) Assign in-stubs and out-stubs to each node according to its degrees. 

(ii) Define as current hub the lowest-index node i with non-zero out-degree. 

(iii) Create a set of forbidden nodes X, which initially contains i and all nodes with 
zero in-degree. 
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(iv) Find the set of allowed nodes A to which an out-stub of i can be connected without 
breaking graphicality. To find A, first determine the maximum fail in-degree k 
using the method described below. Then A will consist of all nodes j ^ X that 
have remaining in-degree greater than k. 

(v) Choose a random node m € A and connect an out-stub of i to one of its in-stubs. 

(vi) Reduce the value of df and in I? by 1, and reorder it accordingly. 

(vii) Add m to the set of forbidden nodes X. 

(viii) If i still has unconnected out-stubs remaining, return to step (iv). 

(ix) If nodes still have unconnected out-stubs, return to step (ii). 

The following simple procedure can be used to efficiently find the fail-in-degree 
in step (iv) of the sampling algorithm. 


(i) Create a new BDS V obtained from T> by reducing the in-degrees of the first 
df — 1 non-forbidden nodes by 1, and reducing the out-degree of * to 1. 

(ii) If j = 1, set fc = 2; otherwise, set k = 1. 

(iii) Compute Lk and Rk of the BDS I?'. 

(iv) If Lk ^ Rk- increase A: by 1; if fc = N, there is no fail-in-degree, and all the 
non-forbidden nodes are allowed, so stop; otherwise, go to step (iii). 

(v) Find the first non-forbidden node in V whose index is greater than fc. 

(vi) Identify this node in the original BDS T>. Its in-degree is the fail-in-degree. Stop. 


As in the case of the sampling algorithm for undirected graphs, this algorithm 
generates directed graph samples biasedly. However, an unbiased estimator for a 
network observable Q for any target distribution P is the weighted average given by 
Eq. A.6 In this case the weights are 


w,, = 


nni 

i=ij=i 


A,, 


(A.16) 


where ly is the total number of hubs used, \Ai^ | is the size of the allowed set immediately 
before placing the j**' connection coming from the hub, and df is the out-degree 
of the node chosen as a hub. Note that, unlike the case for undirected networks, 
there is no factorial combinatorial factor in the weights. This is because while the 
particular sequence of hub nodes chosen depends on the links placed, every node with 
non-zero out-degree will be selected, sooner or later, as the hub. Therefore, all the 
samples produced would have an extra, identical, multiplicative factor of 11^ i 
As only the relative probabilities are needed for estimating an observable, and this 
factor is the same for every possible sample, it is eliminated from the formula for the 
weights. 


Appendix B. An explicit example 

To illustrate the sampling mechanism and the difference between weighted and 
unweighted estimation, we consider the realizations of the JDM 

/ 0 0 0 \ 

J= 0 2 4 , (B.l) 

VO 4 ij 
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"Pentagon" graph 


"Hexagon" graph 




"Bow tie" graph 


Figure Bl. Possible realizations of the JDM in Eg. |B.l| up to isomorphism. 


and explicitly compute the average local clustering coefficients (c^) of the nodes of 
degree d, for all values of d. This JDM induces the degree sequence V — {2, 2, 2, 2,3, 3}, 
and, up to isomorphism, has only three possible realizations, shown in Fig. |B1[ From 
the figures, it is easy to see that, for the pentagon graphs, { 02 )p = 1/4 and { 03 )p = 1/3. 
Also, for the hexagon graphs { 02 )p = { 03 )p = 0, while for the bow tie graphs { 02 )p = 1 
and {c 3 )p = 1/3. 

Appendix B.l. Unweighted estimate 

To calculate the theoretical results for the unweighted case, we need to consider the 
probability with which our algorithm generates each degree-spectra matrix from J. 
To this purpose, first note that there are several degree-spectra matrices whose 
realizations are all pentagon graphs. Also, all the hexagon and bow tie graphs have 
the same degree-spectra matrix 

/ 000000 \ 

= 1 1 1 1 2 2 . (B.2) 

\ 1 1 1 1 1 1 / 

This allows us to compute just the probability of generating S, as all the other matrices 
will yield the same contribution to ( 02 )p and (03) p. 

Our method chooses the elements of the degree-spectra matrix S being created in 
a systematic way, node by node. As there are no nodes of degree 1, all the element in 
the first row of the matrix are fixed to 0. Then, the first element to choose is 82 ^, that 
is, the number of edges between node 1 and nodes of degree 2. The possible choices for 
this element are 0, 1, and 2. Choosing 0 or 2 will result necessarily in a degree-spectra 
matrix whose realizations are all pentagon graphs. In fact, from Fig. |Bl| one can see 
that, amongst the realizations of J, the pentagon graphs are the only ones in which a 
node of degree 2, such as node 1, has either no edges or 2 edges with nodes of degree 2. 
Thus, choosing the value of S' 2,1 with uniform probability, at this stage one generates 
pentagon graphs with probability 2/3. 
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The remaining choice, iS' 2,1 = 1, happens with probability 1/3. In this case, 5 ' 3 ^i 
is forced to be 1, since the elements in the first column of S must sum up to the degree 
of the node 1, which is 2. The next element to determine is then S' 22 . Similarly to the 
previous case, the possible values are 0, 1, and 2. Choosing 0 or 2 will always result in 
pentagon graphs, whose probability of being generated increases by 1/3 • 2/3 = 2/9. 

Choosing S' 2,2 = 1, which occurs with total probability 1/3 • 1/3 = 1/9, forces 
83,2 = 1- The next value to determine is that of S' 2 , 3 . As before choosing 0 
or 2 yields pentagon graphs, whose total probability of being generated increases 
by 1/3-1/3-2/3 = 2/27. 

The choice of S 2.3 = 1, which has a total probability 1/3 • 1/3 • 1/3 = 1/27 of 
happening, implies that S' 2,3 = 1. Then, the degree-spectra matrix being built can 
only be Shb- In fact, as it is evident from Fig. |B1| the only graphs realizing J in 
which at least 3 nodes of degree 2 are linked exactly to one other node of degree 2 and 
one of degree 3, are hexagon and bow tie graphs. 

This shows that the degree-spectra matrix Shb occurs with probability 1/27; 
conversely, degree-spectra matrices yielding pentagon graphs occur with probability 
26/27. 

The next step in our evaluation is to compute the probabilities of generating 
any of the hexagon and bow tie graphs from the degree-spectra matrix Shb- The 
graph-construction part of our algorithm consists in generating all the GafS subgraphs 
between nodes of degree a and nodes of degree j3. In the current example, there are 
three such subgraphs, namely G' 2 , 2 , G 2 , 3 , and 03 ^ 3 . Of these, 03^3 consists simply in 
a single edge between the two nodes of degree 3. Thus, the only variability is given 
by the choices for the two remaining subgraphs. 

The possible realizations of G 2,2 are illustrated in panels (a), (b) and (c) of Fig. 


B2 


Each is determined by the placement of a single edge, which forces the choice for the 
remaining one. Thus, each is produced by our algorithm with the same probability 
of 1/3. Similarly, each of the possible realizations for 6 * 2 , 3 , shown in panels (d) to (i) 
of Fig. |B2[ is determined by the edges incident to node 5 or node 6 . As these are 
chosen by our algorithm fully randomly, all the possible realizations occur with the 
same probability of 1/6. The particular type of graph that is produced depends on 
the specific realizations of the subgraphs. As there are 3 realizations for 62,2 and 
6 for G 23 , the total number of graphs is 18. Of these, 1/3 are bow tie graphs, and 
the remaining 2/3 are hexagon graphs. In particular, the bow tie graphs correspond 
to the subgraph choices (a,d), (a,i), (b,e), (b,h), (c,f) and (c,g), as it is easy to see 
from Fig. |B2| Note that this indicates that, for this specific degree-spectra matrix, 
the sampling is already uniform. 

It is possible, now, to compute the average clustering coefficients for the 
unweighted estimation. To do so, first compute their average over the realizations 
of S' 


HB- 


('^2)hB — o ■ ^ T 


(^3)hB — 


0 = 


1 


0 = 


(B.3) 

(B.4) 


Then, knowing that Shb is sampled with probability 1/27, and the remaining degree- 
spectra matrices always yield pentagon graphs, it is 

unweighted 2y 27 4 162 ' 
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Figure B2. Degree-class subgraphs realizing the degree-spectra matrix Shb of 
Eg. |B.2[ Panels (a), (b) and (c) show the possible realizations of G 2 , 2 ; panels (d) 
to (i) show the possible realizations of G 2 , 3 - 


(ca) 


unweighted 
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Appendix B.2. Weighted estimate 

In order to obtain an analytical result for the weighted estimate, rather than computing 
the probability of occurrence of each degree-spectra matrix as sampled by our 
algorithm, we need to compute their actual number. From the previous subsection, 
we already know that all the hexagon and bow tie graphs come from the same degree- 
spectra matrix Shb, which is unique. Then, we only need to compute the number of 
degree-spectra matrices corresponding to pentagon graphs. 

To do so, remember that the first choice in the construction of a degree-spectra 
matrix from J is the value of the element S' 2 , 1 . If 52,1 = 0 or 52 ,i = 2, then we are 
guaranteed to get a pentagon graph. However, while each of these two choices fixes 
the value of 53 , 1 , we are still free to select a value for the next “free” element, S 2 , 2 - 
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Figure B3. Decisional tree for the construction of degree-spectra matrices 
realizing J and corresponding to pentagon graphs. The 12 leaves of the tree 
are shown in red. 


If <S' 2 ,i = 0, the allowed values for S' 2,2 are 1 and 2. Choosing 2 fixes all the other 
elements of the degree-spectra matrix. Conversely, choosing 1 results in 5'2,3 still to be 
determined. Its possible values are 1 and 2. Thus, there are 3 different degree-spectra 
matrices with S' 2,1 = 0. 

If, instead, 82 ^ = 2, the situation is very similar to the first case. The possible 
choices for 82,2 are 0 and 1. Choosing 0 fixes the entire matrix; choosing 1 requires 
to select a value for S' 2 , 3 , which can be either 0 or 1. Thus, there are 3 matrices with 
52,1 = 2 . 

The third possibility of ^ 2,1 = 1 still allows degree-spectra matrices corresponding 
to a pentagon graph. Similarly to the previous case, the simplest way to construct 
one is to impose 82.2 = 0 or 82^2 = 2. In both cases, one must then choose a value for 
52,3. The possibilities are 1 and 2, if 82,2 = 0, or 0 and 1, if 82.2 = 2. Any choice for 
52,3 fixes all the remaining elements of the matrix. 

Finally, it is still possible to choose 52 ,1 = 1 and 82,2 = 1, and still construct 
matrices corresponding to a pentagon graph. The choice is again on 82,3- Choosing 
52,3 = 0 or 52,3 = 2 fixes all the other elements of the matrix, whose realizations will be 
pentagon graphs. Imposing 82,3 = 1, instead results in the matrix 8 hb, exhausting 
all possibilities. This shows that there are 6 different matrices with 52 ,1 = 1 that 
generate pentagon graphs. 

The decisional tree we just described is shown in Fig. |B3| as a visual aid. In 
summary, there are 12 different degree-spectra matrices that realize J and whose 
realizations are always pentagon graphs. Knowing (c 2 )p, ( 03 )p, (c 2 )pp and 
which we computed before, we can finally calculate the weighted average clustering 
coefficients: 
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Table Bl. Comparison between analytical and simulated averaged local 
clustering coefficients. 


Coeff. 

Theor. unweigh. 

Simul. unweigh. 

Theor. weigh. 

Simul. weigh. 

C2 

0.25309 

0.25320 

0.25641 

0.25664 

C3 

0.32510 

0.32483 

0.31624 

0.31570 


Appendix B.3. Numerical verification 


To validate our algorithm against the analytical results presented in the two 
subsections above, we performed extensive numerical simulations, generating 10^ 
degree-spectra matrices, and 10^ samples per matrix, for a total of 10^ graphs. For 
each graph generated, we saved the average local clustering coefficients for nodes of 
both degrees. Then, we obtained both weighted and unweighted results by averaging 
the data first naively, and then with a proper use of the weights according to Eq. 
The results, shown in Table Bl show that the weighted averages obtained using our 
algorithm converge to the correct result. Also, the difference between weighted and 
unweighted results can be appreciated even when it is quite small, as in our example. 
This illustrate the sensitivity of our method, as well as the necessity of using proper 
sampling when performing this kind of studies. 
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